
This repository contains the code and data for reproducing:

"Python Libraries and the Shape of Collective Attention: Cybersecurity and the Rise of AI"
by Kaylea Champion and Ellie Ross

Conducted in partial fulfillment of the course requirements for CS&SS 592 A, Applied Longitudinal Analysis
Instructor: Dr. Elena Erosheva
University of Washington


1. If you would like to download fresh versions of the datasets rather than use the versions we provide, you may obtain them from:

	- Combined download figures for Python: http://www.lesfleursdunormal.fr
	- CVE data from NIST: https://nvd.nist.gov/
	- Metadata via a Google BigQuery query:

		SELECT DISTINCT name
		FROM `bigquery-public-data.pypi.distribution_metadata`
		WHERE CONTAINS_SUBSTR(description, 'gpt')
		   OR CONTAINS_SUBSTR(description, 'machine learning')
		   OR CONTAINS_SUBSTR(description, 'artificial intelligence')
		   OR CONTAINS_SUBSTR(description, 'llm')
		   OR CONTAINS_SUBSTR(summary, 'gpt')
		   OR CONTAINS_SUBSTR(summary, 'machine learning')
		   OR CONTAINS_SUBSTR(summary, 'artificial intelligence')
		   OR CONTAINS_SUBSTR(summary, 'llm')
		   OR CONTAINS_SUBSTR(keywords, 'gpt')
		   OR CONTAINS_SUBSTR(keywords, 'ai')
		   OR CONTAINS_SUBSTR(keywords, 'machine learning')
		   OR CONTAINS_SUBSTR(keywords, 'artificial intelligence')
		   OR CONTAINS_SUBSTR(keywords, 'llm')
		   OR CONTAINS_SUBSTR(classifiers, "Topic :: Scientific/Engineering :: Artificial Intelligence")

Note that we filtered the raw CVE dataset using the command:

	grep pypi cve.csv > cve_clean.csv

2. Update the paths at the top of prepAndClean.R and models.R to match your environment. Note that some values are saved to a file that knitr uses when compiling the paper's .Rtex source.

3. Run prepAndClean.R

4. Run models.R

5. Optionally, run diagnostics.R
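
The steps above can be sketched as a pair of shell functions. This is only an illustration under some assumptions that are not part of the repository: Rscript is on your PATH, the three R scripts are in the current directory, and the function names (filter_pypi_cves, run_analysis) and the RSCRIPT override are hypothetical helpers we made up for this example.

```shell
# Sketch of the workflow described above (assumed helpers, not repository code).

# Keep only the CVE rows that mention "pypi" (the filter from step 1).
# Usage: filter_pypi_cves RAW_CSV CLEAN_CSV
filter_pypi_cves() {
    local raw="$1" clean="$2"
    # grep exits with status 1 when nothing matches; the output file is
    # still created (empty) in that case.
    grep pypi "$raw" > "$clean"
}

# Run the analysis scripts in the documented order (steps 3 to 5).
# RSCRIPT is an assumed override, e.g. RSCRIPT=echo prints the script
# names instead of running them.
# Usage: run_analysis [--with-diagnostics]
run_analysis() {
    local rscript="${RSCRIPT:-Rscript}"
    "$rscript" prepAndClean.R || return 1
    "$rscript" models.R || return 1
    if [ "${1:-}" = "--with-diagnostics" ]; then
        "$rscript" diagnostics.R || return 1
    fi
}
```

For example, after updating the paths in step 2, you might call filter_pypi_cves cve.csv cve_clean.csv (only needed for a fresh CVE download) and then run_analysis --with-diagnostics. Note that a plain grep keeps the CSV header row only if that row itself contains "pypi".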

Please let us know if you have questions!
